CPS222 Lecture: Maps; Binary Search Trees               Last revised 1/25/2013

Objectives

1. To review the general concept of a map
2. To define binary search trees
3. To show how to perform operations on BST's

Materials

1. Code for BST algorithms to project

I. Introduction - Maps
-  ------------ - ----

   A. One kind of data structure that shows up in many places is some form of
      search structure, or map.  Conceptually, such a structure is a
      collection of key, value pairs that can be accessed by key.  
      
   B. Such a structure typically supports operations for insertion, lookup, and
      removal of entries - though in some problems the contents of the map
      may be fixed so that only lookup is needed (in which case a different
      implementation may be used).  These operations take the following form:
      
      1. Insertion:
                     __________________
        Key, value  | Map              |
        ----------> | (key,value pairs)|
                    |__________________|

      2. Lookup:
                     ___________
        Key         | Map       | Value
        ----------> |           | -------> 
                    |___________|

      3. Deletion:   ____________________________
        Key         | Map                        |
        ----------> | (key and its value removed)|
                    |____________________________|
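
      The three operations can be sketched in C++ as follows.  This is only an
      illustration: it uses the standard library's std::map as the underlying
      structure, and the names LegCounts, insertEntry, lookupEntry, and
      removeEntry are hypothetical, chosen for this example.

```cpp
#include <map>
#include <string>

// Hypothetical example map: animal name (key) -> number of legs (value).
using LegCounts = std::map<std::string, int>;

// Insertion: store value under key, overwriting any existing entry.
void insertEntry(LegCounts& m, const std::string& key, int value) {
    m[key] = value;
}

// Lookup: return a pointer to the value stored under key,
// or nullptr if the key is not in the map.
const int* lookupEntry(const LegCounts& m, const std::string& key) {
    auto it = m.find(key);
    return it == m.end() ? nullptr : &it->second;
}

// Deletion: remove key and its value; returns true if the key was present.
bool removeEntry(LegCounts& m, const std::string& key) {
    return m.erase(key) > 0;
}
```

      Returning a pointer from lookupEntry is one way to signal "not found";
      other designs (an exception, or a bool plus an output parameter) would
      serve equally well.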
                    
   C. There are actually quite a number of ways that such a structure can be
      implemented.
      
      1. Pile (unordered array).
      
         Insertion is O(1) (add at the end), but lookup and removal are O(n),
         so it is suitable only for small sizes - where its simplicity may
         actually make it desirable.
         
      2. Ordered array
      
         Insertion and removal are O(n), but lookup is O(log n) since binary
         search can be used.  Suitable only when contents are unchanging (e.g. 
         table of keywords) - where its simplicity may make it desirable.
         
      3. Linked list - unordered
      
         Insertion is O(1), but lookup is O(n).  Deletion is O(n) to find the
         "victim" but then only O(1) to actually remove it.  Suitable only
         in situations where insertion is the dominant operation (e.g.
         archival storage of information that is seldom actually referenced)
         
      4. Binary search tree - we will discuss today
      
      5. Hash table - we will discuss later in the week
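
      To illustrate the trade-off described for the ordered array (item 2), the
      following is a minimal sketch, assuming the keys are kept in a sorted
      std::vector of strings.  The function names containsKey and insertKey
      are hypothetical.  Lookup uses the standard std::lower_bound binary
      search; insertion finds the slot quickly but must still shift elements.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Lookup: binary search over the sorted keys, O(log n) comparisons.
bool containsKey(const std::vector<std::string>& sortedKeys,
                 const std::string& key) {
    auto it = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), key);
    return it != sortedKeys.end() && *it == key;
}

// Insertion: the slot is found in O(log n), but vector::insert must shift
// every later element, so the operation as a whole is O(n).
// Keys already present are left alone (an assumption of this sketch).
void insertKey(std::vector<std::string>& sortedKeys, const std::string& key) {
    auto it = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), key);
    if (it == sortedKeys.end() || *it != key) {
        sortedKeys.insert(it, key);
    }
}
```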

II. Binary Search Trees
--  ------ ------ -----

   A. One of the two main ways to implement a map of significant size where
      modification is needed is to use a binary search tree.

   B. Definition: a binary search tree is a binary tree in which each node
      contains a value (called a key) that is a member of a totally ordered
      set.  Further, if p is a pointer to a node, then p -> _key >= every key
      in the node's left subtree, and p -> _key <= every key in the
      node's right subtree.

   C. Observe: if one traverses a binary search tree in inorder, the nodes
      are visited in ascending order of the keys.

        Ex:     DOG
               /   \
            BISON  FOX
            /   \
    AARDVARK   CAT

       is a binary search tree.  Its inorder traversal is:

        AARDVARK BISON CAT DOG FOX
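
      The example can be expressed in code.  The sketch below assumes a
      hypothetical Node type whose field names (_key, _lchild, _rchild) are
      modeled on the p -> _key notation of the definition; they are not taken
      from the projected course code.  The helper buildExample constructs the
      five-node tree shown above, and inorder collects the keys in traversal
      order.

```cpp
#include <string>
#include <vector>

// Hypothetical node layout for a BST with string keys.
struct Node {
    std::string _key;
    Node* _lchild;
    Node* _rchild;
    Node(const std::string& key, Node* l = nullptr, Node* r = nullptr)
        : _key(key), _lchild(l), _rchild(r) {}
};

// Inorder traversal: left subtree, then the node itself, then right subtree.
// The keys visited are appended to out.
void inorder(const Node* p, std::vector<std::string>& out) {
    if (p == nullptr) return;
    inorder(p->_lchild, out);
    out.push_back(p->_key);
    inorder(p->_rchild, out);
}

// Builds the example tree: DOG at the root, BISON and FOX as its children,
// AARDVARK and CAT as the children of BISON.
// (The nodes are never freed; this is only a sketch.)
Node* buildExample() {
    Node* bison = new Node("BISON", new Node("AARDVARK"), new Node("CAT"));
    return new Node("DOG", bison, new Node("FOX"));
}
```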

III. Operations on Binary Search Trees
---  ---------- -- ------ ------ -----

   A. The utility of binary search trees comes from the fact that inserting,
      looking up, and deleting the node containing a certain key all take
      time proportional to the height of the tree.

      1. If the tree is well balanced, then its height will be proportional to 
         the logarithm of the number of nodes.

         a. Observe that, in a perfect binary tree, there are twice as many
            nodes at each level as there are at the preceding level (since each
            node has two children).  Thus, the number of nodes in the
            tree grows as 2^height - which makes the height proportional
            to the log of the number of nodes.  (You will develop a more
            formal proof of this for a homework.)

         b. If keys are inserted into a binary tree in random order, the
            resultant tree will not be perfect, of course; but the height will,
            on average, still be proportional to log n.  (This can be shown
            experimentally.)

      2. To see the utility of this, we can compare the average number of steps 
         needed for various operations on various search structures, assuming 
         that, in each case, the structure contains 1000 elements:

        structure                  insert    delete    lookup

        pile                          1       500       500
        ordered array               500       500        10 [binary search]
        linked list (unordered)       1       500       500
        binary search tree           10        10        10
         (if balanced)
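
      The claim that random insertion order still gives a height proportional
      to log n can be explored with a small experiment.  The following is a
      minimal sketch under stated assumptions: integer keys, a hypothetical
      Node type, the simplest (unbalanced) insertion, and height counted in
      levels.  The function randomTreeHeight is an illustrative name, not part
      of the course code.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical node type with integer keys.
struct Node {
    int _key;
    Node* _lchild = nullptr;
    Node* _rchild = nullptr;
    explicit Node(int key) : _key(key) {}
};

// Simple unbalanced insertion, written as a loop; duplicates are ignored.
void insert(Node*& root, int key) {
    Node** link = &root;               // the link that may receive the new node
    while (*link != nullptr) {
        if (key == (*link)->_key) return;
        link = (key < (*link)->_key) ? &(*link)->_lchild : &(*link)->_rchild;
    }
    *link = new Node(key);
}

// Height counted in levels: an empty tree has height 0, a single node 1.
int height(const Node* p) {
    if (p == nullptr) return 0;
    return 1 + std::max(height(p->_lchild), height(p->_rchild));
}

// Inserts the keys 0..n-1 in a shuffled order and returns the tree's height.
// (The nodes are intentionally leaked; this is only an experiment.)
int randomTreeHeight(int n, unsigned seed) {
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(keys.begin(), keys.end(), gen);
    Node* root = nullptr;
    for (int k : keys) insert(root, k);
    return height(root);
}
```

      For 1000 keys, a perfectly balanced tree would have 10 levels.  Running
      randomTreeHeight(1000, seed) for several seeds should give heights that
      are a modest multiple of that, far below the worst case of 1000.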

   B. Algorithms for binary search trees:

      1. Finding a node containing a given key:

         PROJECT Code for lookup (recursive and non-recursive versions)

         Observe: This algorithm (in either form) requires a number of steps
                  proportional to the height of the tree.
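
         For reference, lookup might be written along the following lines.
         This is a sketch, not the projected code: the Node type and the
         function names lookupRecursive and lookupIterative are assumptions
         made for this example.  Both versions follow one path down from the
         root, which is why the cost is bounded by the height.

```cpp
#include <string>

// Hypothetical node layout; field names follow the p -> _key notation.
struct Node {
    std::string _key;
    Node* _lchild;
    Node* _rchild;
};

// Recursive lookup: returns the node holding key, or nullptr if absent.
const Node* lookupRecursive(const Node* p, const std::string& key) {
    if (p == nullptr || key == p->_key) return p;
    return key < p->_key ? lookupRecursive(p->_lchild, key)
                         : lookupRecursive(p->_rchild, key);
}

// Non-recursive lookup: the same descent written as a loop.
const Node* lookupIterative(const Node* p, const std::string& key) {
    while (p != nullptr && key != p->_key) {
        p = (key < p->_key) ? p->_lchild : p->_rchild;
    }
    return p;
}
```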

      2. Inserting a new key - simplest form:

         PROJECT Code for insert (recursive and non-recursive versions)

         a. Observe: This algorithm (in either form) requires a number of steps
            proportional to the height of the tree.

         b. Observe that this insertion algorithm, while very simple, could
            lead to a highly non-optimal tree.

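            Ex: Consider what happens if keys are presented in reverse order:

                FOX  DOG  CAT  BISON  AARDVARK

            Each key becomes the left child of the one before it, so the tree
            degenerates into a chain whose height equals the number of keys.

            But note that the same thing happens (with right children) when
            they are presented in forward order!

                AARDVARK BISON CAT DOG FOX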

         c. When we come to balanced binary search trees in a week or so, we
            will see that there are several relatively simple ways to avoid
            such problems - leading to the ability to guarantee that the height
            of the tree will never be more than a fixed (small) multiple of
            log n.
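
         The simple insertion algorithm, and its behavior on ordered input,
         can be sketched as follows.  This is not the projected code: the Node
         type and the names insertRecursive, insertIterative, height, and
         heightAfterInserting are assumptions made for this example, and the
         sketch ignores a key that is already present.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical node layout; field names follow the p -> _key notation.
struct Node {
    std::string _key;
    Node* _lchild = nullptr;
    Node* _rchild = nullptr;
    explicit Node(const std::string& key) : _key(key) {}
};

// Recursive insertion: descend until an empty link is found, then attach.
void insertRecursive(Node*& p, const std::string& key) {
    if (p == nullptr)       p = new Node(key);
    else if (key < p->_key) insertRecursive(p->_lchild, key);
    else if (key > p->_key) insertRecursive(p->_rchild, key);
    // Equal key: already present, nothing to do (assumption of this sketch).
}

// Non-recursive insertion: track the link that will receive the new node.
void insertIterative(Node*& root, const std::string& key) {
    Node** link = &root;
    while (*link != nullptr) {
        if (key == (*link)->_key) return;
        link = (key < (*link)->_key) ? &(*link)->_lchild : &(*link)->_rchild;
    }
    *link = new Node(key);
}

// Height counted in levels: an empty tree has height 0.
int height(const Node* p) {
    if (p == nullptr) return 0;
    return 1 + std::max(height(p->_lchild), height(p->_rchild));
}

// Inserts the keys in the given order and returns the resulting height.
// (The nodes are intentionally leaked; this is only a sketch.)
int heightAfterInserting(const std::vector<std::string>& keys) {
    Node* root = nullptr;
    for (const std::string& k : keys) insertIterative(root, k);
    return height(root);
}
```

         With the five animal names, both the reverse and the forward order
         give a tree of height 5 (one node per level), whereas the order
         DOG BISON FOX AARDVARK CAT gives the height-3 tree shown earlier.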

      3. Deletion: This is a bit more complex than the other two operations.

         a. If the node we are removing has no children, it can be deleted
            and the pointer to it in its parent can be set to NULL.

         b. If the node has one child, that one child can become the child
            of the parent of the removed node (the grandparent adopts the 
            grandchild)

         c. But if the node being removed has two children, life is more
            complex.  Our basic goal is to guarantee that the resultant
            tree has the same inorder traversal as the original tree - minus
            the removed node.

                Observe: let the inorder traversal of the original tree be as
                         follows, where D is the node being removed, P is its
                         inorder predecessor, and S is its inorder successor:

                        ... P D S ...

                what we want is a tree that traverses as follows:

                        ... P S ...

                Observe: P is in D's left subtree, and S is in D's right
                         subtree.

                Observe: P cannot have a right child, and S cannot have a
                         left child.  (D is the inorder successor of P, and is 
                         above P in the tree.  If P had an rchild, P's inorder
                         successor would lie in P's right subtree.  A similar 
                         argument holds for S)

                Therefore: what we can do is arbitrarily choose either P or S,
                           copy its data up to node D, and then remove P or S
                           as the case may be.  (Since P and S have a maximum
                           of one child, removing either is less difficult.)

        PROJECT Code for remove (recursive implementation only)

        Observe: This algorithm requires a number of steps proportional to the 
                 height of the tree.
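
         The three cases can be sketched recursively as follows.  This is not
         the projected code: the Node type and the names removeKey and inorder
         are assumptions made for this example, keys are assumed to be unique,
         and the sketch arbitrarily chooses the inorder successor S rather
         than the predecessor P in the two-child case.

```cpp
#include <string>
#include <vector>

// Hypothetical node layout; field names follow the p -> _key notation.
struct Node {
    std::string _key;
    Node* _lchild = nullptr;
    Node* _rchild = nullptr;
    explicit Node(const std::string& key) : _key(key) {}
};

// Removes key from the subtree whose root link is p, if it is present.
void removeKey(Node*& p, const std::string& key) {
    if (p == nullptr) return;                      // key is not in the tree
    if (key < p->_key) {
        removeKey(p->_lchild, key);
    } else if (key > p->_key) {
        removeKey(p->_rchild, key);
    } else if (p->_lchild != nullptr && p->_rchild != nullptr) {
        // Two children: S is the leftmost node of the right subtree.
        Node* s = p->_rchild;
        while (s->_lchild != nullptr) s = s->_lchild;
        p->_key = s->_key;                         // copy S's data up into D
        removeKey(p->_rchild, p->_key);            // remove S (0 or 1 child)
    } else {
        // No children or one child: the parent's link takes over the child
        // (or becomes nullptr), and the node itself is freed.
        Node* victim = p;
        p = (p->_lchild != nullptr) ? p->_lchild : p->_rchild;
        delete victim;
    }
}

// Inorder traversal, used to check that the remaining keys stay in order.
void inorder(const Node* p, std::vector<std::string>& out) {
    if (p == nullptr) return;
    inorder(p->_lchild, out);
    out.push_back(p->_key);
    inorder(p->_rchild, out);
}
```

         Because p is a reference to the link in the parent (or to the root
         pointer), assigning to p is what makes "the grandparent adopt the
         grandchild" without any separate parent pointer.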